Subset Selection Algorithms: Randomized vs. Deterministic
Abstract. Subset selection is a method for selecting a subset of columns from a real matrix so that the subset represents the entire matrix well and is far from being rank deficient. We begin by extending a deterministic subset selection algorithm to matrices that have more columns than rows. Then we investigate a two-stage subset selection algorithm that uses a randomized stage to pick a smaller number of candidate columns, which are then forwarded to the deterministic stage for subset selection. We perform extensive numerical experiments to compare the accuracy of this algorithm with that of the best known deterministic algorithm. We also introduce an iterative algorithm that systematically determines the number of candidate columns picked in the randomized stage, and we recommend a specific value. Motivated by our experimental results, we propose a new two-stage deterministic algorithm for subset selection. In our numerical experiments, this new algorithm appears to be as accurate as the best deterministic algorithm, but it is faster, and it is also easier to implement than the randomized algorithm.
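The abstract does not spell out either stage, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes the randomized stage samples candidate columns with probabilities proportional to leverage scores computed from the top-k right singular vectors (a common choice for such stages, not confirmed by the abstract). It also assumes the deterministic stage is plain greedy QR with column pivoting, used here as a stand-in for a rank-revealing QR, since the "best known deterministic algorithm" is not named. The function names `two_stage_subset_selection` and `pivoted_qr_select` and the parameters `k` (subset size) and `c` (number of candidates) are hypothetical.

```python
import numpy as np


def pivoted_qr_select(M, k):
    """Hypothetical helper: pick k column indices of M by greedy QR with column pivoting.

    Each step takes the column with the largest residual norm, then projects
    that direction out of every column (Gram-Schmidt style).
    """
    R = np.array(M, dtype=float)          # working copy of the residual columns
    selected = []
    for _ in range(k):
        norms = np.sum(R * R, axis=0)     # squared residual column norms
        norms[selected] = -1.0            # never pick the same column twice
        j = int(np.argmax(norms))
        selected.append(j)
        nrm = np.sqrt(norms[j])
        if nrm > 0.0:
            q = R[:, j] / nrm
            R -= np.outer(q, q @ R)       # remove the q-component from all columns
    return selected


def two_stage_subset_selection(A, k, c, seed=None):
    """Hypothetical sketch: randomized candidate stage, then deterministic selection."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if not (1 <= k <= min(m, n) and k <= c <= n):
        raise ValueError("need 1 <= k <= min(m, n) and k <= c <= n")
    rng = np.random.default_rng(seed)

    # Top-k right singular vectors; the rows of Vk_t are orthonormal (k x n).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]

    # Column leverage scores sum to k; normalize them into probabilities.
    p = np.sum(Vk_t ** 2, axis=0)
    p = p / p.sum()

    # Randomized stage (assumption: c distinct columns, drawn without replacement).
    c = min(c, int(np.count_nonzero(p)))
    cand = rng.choice(n, size=c, replace=False, p=p)

    # Deterministic stage: pivoted QR on the rescaled candidate columns of Vk_t.
    scaled = Vk_t[:, cand] / np.sqrt(p[cand])
    local = pivoted_qr_select(scaled, k)
    return np.sort(cand[local])


if __name__ == "__main__":
    A = np.random.default_rng(0).standard_normal((20, 50))  # more columns than rows
    cols = two_stage_subset_selection(A, k=5, c=15, seed=1)
    print(cols)  # five distinct column indices of A
```

The deterministic stage only ever sees the c candidate columns rather than all n columns, which is where such a two-stage scheme would save work. The 1/sqrt(p_j) rescaling mirrors the reweighting usually paired with leverage-score sampling; a more careful version might swap the greedy pivoting for a strong rank-revealing QR.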
Similar Resources
Ridge Regression and Provable Deterministic Ridge Leverage Score Sampling
Ridge leverage scores provide a balance between low-rank approximation and regularization, and are ubiquitous in randomized linear algebra and machine learning. Deterministic algorithms are also of interest in the moderately big data regime, because deterministic algorithms provide interpretability to the practitioner by having no failure probability and always returning the same results. We pr...
Faster Subset Selection for Matrices and Applications
We study the following problem of subset selection for matrices: given a matrix X ∈ ℝ^(n×m) (m > n) and a sampling parameter k (n ≤ k ≤ m), select a subset of k columns from X such that the pseudoinverse of the sampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation...
Practical Algorithms for Selection on Coarse-Grained Parallel Computers
In this paper, we consider the problem of selection on coarse-grained distributed memory parallel computers. We discuss several deterministic and randomized algorithms for parallel selection. Experimental results on the CM5 demonstrate that randomized algorithms are superior to their deterministic counterparts.
.1 Error
Randomized algorithms have an additional primitive operation that deterministic algorithms do not have. We can select a number from a range [1 . . .x] uniformly at random, at a cost assumed to be linearly dependent on the size of x in binary representation. The algorithm then makes a decision based on the outcome of this random selection. We first look at some defining characteristics of random...
Tree Pattern Matching to Subset Matching in Linear Time
In this paper, we show an O(n + m) time Turing reduction from the tree pattern matching problem to another problem called the subset matching problem. Subsequent works have given efficient deterministic and randomized algorithms for the subset matching problem. Together, these works yield an O(n log m + m) time deterministic algorithm and an O(n log n + m) time Monte Carlo algorithm for the tre...
Publication date: 2010